SYSTEM FOR CORRECTION OF SPEECH RECOGNITION RESULTS WITH CONFIDENCE LEVEL
INDICATION

The invention relates to a correction device for correcting text passages in
recognized text information, which recognized text information is recognized by a speech
recognition device from speech information and is therefore associated with the
speech information.

The invention further relates to a correction method for correcting text passages
in recognized text information, which recognized text information is recognized by a
speech recognition device from speech information and is therefore associated with
the speech information.

The invention also relates to a computer program product which comprises
correction software of word processing software which is executed by a computer.



Such a correction device and such a correction method are known, e.g., from
document US-A-6,173,259. The known correction device is realized by means of a
computer executing word processing software of a corrector of a transcription service.
The corrector is an employee who manually corrects text information that has been
recognized automatically from speech information by a speech recognition program.

The speech information in this case is a dictation generated by an author, which
dictation is transmitted to a server via a computer network. The server distributes the
received speech information of dictations to various computers, each of which executes
speech recognition software constituting a speech recognition device in this case.

The known speech recognition device recognizes text information from the
speech information of the author's dictation sent to it, with link information also being
established. The link information marks, for each word of the recognized text information,
the part of the speech information for which the word was recognized by the speech
recognition device. The speech information of the dictation, the recognized text
information and the link information are transferred from the speech recognition device to
the computer of the corrector for a correction process.

The known correction device contains synchronous playback means, by which
a synchronous playback mode can be performed. When the synchronous playback
mode is active in the correction device, the speech information of the dictation is played
back while, in synchronism with each acoustically played-back word of the speech
information, the word recognized from the played-back word by the speech recognition
system is marked with an audio cursor. The audio cursor thus marks the position, in the
recognized text information, of the word that has just been acoustically played back.

If the corrector notices an unsuitable or incorrectly recognized text passage,
that text passage is replaced with a different, correct or suitable, text passage. Such
correction work is extremely time-consuming and thereby considerably increases the cost
of the transcription. On the other hand, if the quality of the recognition and correction of
the recognized text is to be at a maximum, the corrector has to listen to the whole
recording and read the whole recognized text. One of the aims, therefore, is to make the
correction work following a recognition as rapid and efficient as possible while achieving
maximum quality of the recognized or corrected text.




It is an object of the invention to provide a correction device of the type
mentioned in the first paragraph, a correction method of the type mentioned in the
second paragraph and a computer program product of the type mentioned in the third
paragraph, with which the above-mentioned disadvantages and shortcomings are avoided.

In order to achieve the above-mentioned object, features in accordance with the
invention are provided in such a correction device, so that the correction device can be
characterized in the way set out in the following.

A correction device for correcting text passages in recognized text
information, which recognized text information is recognized by a speech recognition
device from speech information and is therefore associated with the speech
information, the correction device comprising: reception means for receiving the speech
information, the associated recognized text information, link information, which
link information for each text passage of the associated recognized text information marks
the part of the speech information at which the text passage was recognized by the speech
recognition device, and confidence level information, which confidence level information
for each text passage of the recognized text information represents a correctness of the
recognition of said text passage; synchronous playback means for performing a
synchronous playback mode, in which synchronous playback mode, during an acoustic
playback of the speech information, the text passage of the recognized text information
associated with the speech information just played back and marked by the link
information is marked synchronously; and indication means for indicating the
confidence level information of a text passage of the text information during the
synchronous playback.

In order to achieve the above-mentioned object, features in accordance with the
invention are envisaged in such a correction method, so that the correction method can be
characterized in the way set out in the following.

A correction method for correcting text passages in recognized text
information, which recognized text information is recognized by a speech recognition
device from speech information and is therefore associated with the speech
information, in which the following steps are performed: receiving the speech information,
the associated recognized text information, link information, which link
information for each text passage of the associated recognized text information marks the
part of the speech information at which the text passage was recognized by the speech
recognition device, and confidence level information, which confidence level information
for each text passage of the recognized text information represents a correctness of the
recognition of said text passage; performing a synchronous playback mode, in which
synchronous playback mode, during acoustic playback of the speech information, the text
passage of the recognized text information associated with the speech information just
played back and marked by the link information is marked synchronously; and indicating
the confidence level information of a text passage of the text information during the
synchronous playback.

In order to achieve the above-mentioned object, such a computer program
product includes features in accordance with the invention, so that the computer program
product can be characterized in the way set out in the following.

A computer program product for a computer, comprising software code
portions for performing the steps of the above-mentioned correction method when said
product is run on the computer.

By virtue of the characteristic features of the invention, it is achieved in a
relatively simple way that, for example, a corrector of a transcription system using a
correction device according to the invention is able to carry out the correction work
following a recognition relatively rapidly and efficiently while ensuring the best quality of
the recognized or corrected text information. In particular, indicating the confidence level
information of a text passage of the recognized text information during the synchronous
playback, rather than as a simultaneous and permanent indication of the confidence values
of all text passages of the text information, has the advantage that the corrector can easily
recognize a wrong or incorrect text passage without being distracted by, or having to
concentrate on, permanent indications.

In the embodiments according to the invention, it has proved to be
advantageous when measures as claimed in claim 2 and claim 7 are provided. The corrector
then focuses not only on individual passages but on the whole document, thereby
guaranteeing higher quality and accuracy.

In an embodiment according to the invention, the indicating of the confidence
level information of a text passage of the text information may be performed acoustically.
In the embodiments according to the invention, it has proved to be very advantageous when
measures as claimed in claim 3 and claim 8 are provided. The visual feedback serves as a
signal that draws the corrector's attention to a particular text passage.

It has further proved to be very advantageous in the embodiments according to
the invention when measures as claimed in claim 4 and claim 9 are provided. By
automatically changing the playback speed for a particular section of the dictation in
dependence on the confidence level information, the attention of the corrector is increased,
resulting in an increased accuracy of the corrected text information. For example, the
playback speed may be slowed down automatically for a text passage with a
lower confidence level.

In the embodiments according to the invention, it has further proved to be
advantageous when measures as claimed in claim 5 and claim 10 are provided. By this, the
accuracy of the corrected text may be further improved.

The invention will be better understood from the following description,
which explains the physical basis of the invention with reference to the enclosed drawing
showing a preferred embodiment as a non-limitative example of implementation.






Figure 1 shows, in accordance with this invention, a correction system in the form
of a block diagram.



Figure 1 shows a correction system 1 which comprises a computer 1a. Speech
recognition software and text processing software are executed by means of the
computer 1a. The correction system 1 has a speech signal input 2, input means 3, a foot
switch 4, a loudspeaker 5 and a screen 6 connected to it. In this case the input means 3
are realized by a keyboard and a mouse.

A speech signal SS is received at the speech signal input 2 and transferred to a
speech engine 7. The speech signal SS in this case is a dictation received from a server via
a network (not shown). A detailed description of receiving such a speech signal SS can be
found in document US 6,173,259 B1, which document is herewith incorporated by
reference.

The speech engine 7 contains an A/D converter 8. By means of the A/D 
converter 8 the speech signal SS is digitized, whereupon the A/D converter 8 transfers 
digital speech data DS to a speech recognizer 9. 

The speech recognizer 9 is designed to recognize text information assigned to
the received digital speech data DS. In the following, said text information is referred to as
recognized text information RTI. The speech recognizer 9 is further designed to establish
link information LI which, for each text passage of the recognized text information RTI,
marks the part of the digital speech data DS at which the text passage has been recognized
by the speech recognizer 9. Such a speech recognizer 9 is known, for example, from the
document US-A-5,031,113, the disclosure of which is deemed to be included in the
disclosure of this document by this reference.

Those skilled in the art will appreciate that the information provided by the
speech recognizer 9 for each recognized text passage can be statistically analyzed. In
particular, the speech recognizer 9 can provide a score indicative of the confidence level
assigned by the speech recognizer 9 to a particular recognition of a particular word. These
scores are analyzed by a confidence level scorer 10 of the speech recognizer 9. In the
following, said scores are referred to as confidence level information CLI.

The speech engine 7 also comprises memory means 11. By means of said
memory means 11, the digital speech data DS transferred by the speech recognizer 9 are
stored along with the recognized text information RTI, the link information LI and the
confidence level information CLI of the speech signal SS.
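
As an illustration only, the data held in the memory means 11 can be pictured as one record per recognized word that carries its text (RTI), its audio span (LI) and its score (CLI). The following minimal Python sketch is not taken from the description; the class names, the millisecond time unit and the 0.0 to 1.0 confidence range are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RecognizedWord:
    """One text passage (here: a single word) with its link and confidence data."""
    text: str          # word of the recognized text information RTI
    start_ms: int      # link information LI: start of the audio span (assumed unit: ms)
    end_ms: int        # link information LI: end of the audio span, exclusive
    confidence: float  # confidence level information CLI, assumed range 0.0 to 1.0


@dataclass
class Dictation:
    """Hypothetical container mirroring what is stored for one dictation."""
    samples: bytes               # digital speech data DS
    words: List[RecognizedWord]  # RTI together with LI and CLI per word

    def text(self) -> str:
        # Join the recognized words into the text that would be displayed.
        return " ".join(w.text for w in self.words)
```

A record of this kind keeps the three kinds of information aligned per word, which is what the synchronous playback and the confidence indication both rely on.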

The correction system 1 also comprises a correction device 12 for recognizing
and correcting wrong or unsuitable recognized text or words. The correction device 12 is
realized by the computer 1a executing the text editing software, which text editing
software contains special correction software for correcting text passages of the recognized
text information. The correction device 12 is hereinafter also referred to as correction
software 12 and contains editing means 13 and synchronous playback means 14.

The editing means 13 are designed to position a text cursor TC at an incorrect
text passage, or a text passage that has to be changed, of the recognized text information
RTI and to edit the recognized text passage in accordance with editing information EI
entered by a user of the correction system 1, which user is a corrector in this case. The
editing information EI in this case is entered by the user with keys of the keyboard of the
input means 3, in a generally known manner.

The synchronous playback means 14 allow a synchronous playback
mode of the correction system 1, in which synchronous playback mode the text passage of
the recognized text information RTI marked by the link information LI concerning the
speech information just played back is synchronously marked during an acoustic playback
of the speech information of the dictation. Such a synchronous playback mode is known,
for example, from the document WO 01/46853 A1, the disclosure of which is deemed to be
included in the disclosure of this document through this reference.

When the synchronous playback mode is active, audio data of the dictation,
which are stored in the memory means 11 as digital speech data DS, can be read out by the
synchronous playback means 14 and continuously transferred to a D/A converter 15. The
D/A converter 15 then converts the digital speech data DS into the speech signal SS. Said
speech signal SS is then passed on to the loudspeaker 5 for acoustic playback of
the dictation.

To activate the synchronous playback mode, the user of the correction system 1
can place his foot on one of two switches provided by the foot switch 4, whereupon control
information CI is transferred to the synchronous playback means 14. The synchronous
playback means 14 then read out, in addition to the digital speech data DS of the dictation,
the link information LI stored for said dictation in the memory means 11.

In synchronous playback mode, the synchronous playback means 14 are further
designed to generate and transfer audio cursor information ACI to the editing means 13.
Immediately after the activation of the synchronous playback mode the editing means 13
are designed to read out the recognized text information RTI from the memory means 11
and to temporarily store it as text information TI to be displayed. Said temporarily stored
text information TI to be displayed corresponds for the time being to the recognized text
information RTI and may be corrected by the corrector by corrections to incorrect text
passages in order to ultimately achieve error-free text information.

The text information TI temporarily stored in the editing means 13 is
transferred from the editing means 13 to image processing means 17. The image processing
means 17 process the text information TI to be displayed and transfer presentable display
information DI to the screen 6. Said display information DI contains the text information
TI to be displayed.

As already mentioned, the display process is window-based. During the
synchronous playback, the user sees the following. First, a window on the
screen or display is filled with the recognized text. The recognized word corresponding to
the speech segment, i.e. the audio data, currently being played back is indicated by
highlighting the word on the screen. The highlighting thus follows the playback of the
speech.
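
How the highlighted word can follow the playback may be sketched as a lookup from the current playback position into the link information. The Python sketch below is an assumption-laden illustration, not the method of the cited documents: the tuple layout `(start_ms, end_ms, word)`, the millisecond unit and the function name `active_word_index` are hypothetical, and the links are assumed to be sorted by start time and non-overlapping.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple

# Hypothetical link entry: (start_ms, end_ms, word), end exclusive.
Link = Tuple[int, int, str]


def active_word_index(links: Sequence[Link], position_ms: int) -> Optional[int]:
    """Return the index of the word whose audio span contains position_ms.

    Returns None when the position falls into a pause between words
    or outside the dictation.
    """
    starts = [start for start, _, _ in links]
    # Index of the last link that starts at or before the playback position.
    i = bisect_right(starts, position_ms) - 1
    if i >= 0 and position_ms < links[i][1]:
        return i
    return None
```

Calling this function each time the playback position advances yields the word to highlight; a result of None would leave no word highlighted during a pause.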

In the embodiment shown in figure 1 the editing means 13 contain indication
means 16. The indication means 16 are constructed for indicating the confidence level
information CLI of a text passage of the text information TI to be displayed during the
synchronous playback, which confidence level information CLI is received from the
memory means 11. In this case the text passage is a single word. It may be observed that
the confidence level of so-called bigrams or trigrams or phrases of the recognized text
information may also be indicated.

It may further be observed that the indication means 16 may be a separate block
within the correction device 12, connected to the editing means 13 and/or the
synchronous playback means 14, receiving confidence level information CLI, audio
cursor information ACI and recognized text information RTI, and outputting text
information TI with a confidence value indication.

In the present embodiment, the indication is performed by applying a color
attribute to each word which is currently "active" in the synchronous playback, that is,
the word which is being played back. A threshold level, or confidence limit, can be
set before starting the synchronous playback mode. The confidence limit may lie, for
example, at 80% of the maximum of the confidence value range of the confidence level
information CLI stored in the memory means 11. Accordingly, for each "active" word an
inquiry takes place as to whether the confidence level information CLI of said word is
smaller than, equal to or greater than the threshold level. If the threshold level is undershot
or equaled, the "active" word is marked, i.e. a color attribute different from the default
color attribute is assigned, resulting in a different color highlighting on screen 6.
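
The threshold inquiry described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: confidence values are normalized to the range 0.0 to 1.0, and the two color names as well as the function name `highlight_color` are hypothetical. Only the comparison rule (mark the word when the threshold is undershot or equaled) comes from the description.

```python
DEFAULT_COLOR = "yellow"      # assumed default highlight of the active word
LOW_CONFIDENCE_COLOR = "red"  # assumed highlight for a marked word


def highlight_color(confidence: float, threshold: float = 0.8) -> str:
    """Choose the color attribute for the currently active word.

    The word is marked when its confidence is below or equal to the
    threshold; otherwise the default color attribute is kept.
    """
    if confidence <= threshold:
        return LOW_CONFIDENCE_COLOR
    return DEFAULT_COLOR
```

With the example limit of 80%, a word scored at exactly 0.8 is marked, whereas a word scored at 0.81 keeps the default highlighting.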

Being notified about the confidence level of a word of the text information TI
only during the synchronous playback, rather than by a permanent indication of the
confidence level information CLI of all words in the displayed text information TI, has the
advantage that the corrector can easily recognize a wrong or incorrect word without being
distracted by, or having to concentrate on, permanent indications.

It may be observed that other visual indications may be used to indicate the
confidence level information CLI of a word when synchronous playback takes place; for
example, the word may be shown bold or underlined. Furthermore, instead of marking the
word, a separate indication at the text window may be provided in the form of a flash-light,
which flash-light indicates the confidence level information CLI, i.e. the
confidence value, of the "active" word. In this way, a corrector just needs to concentrate on
the flash-light in a fixed position rather than, in synchronous playback mode, following the
"active" words in the text displayed and/or highlighted on screen 6.

Since the playback speed in synchronous playback mode may be comparatively
fast, the playback speed may be changed automatically in dependence on the confidence
level. For example, the playback speed for a word with 80% of the maximum confidence
value may be reduced by half relative to the normal playback speed of a word with the
maximum confidence value, i.e. a word regarded as correctly recognized.
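
The confidence-dependent playback speed can be illustrated with a simple step rule. The description gives only one example point (half speed at 80% of the maximum confidence value), so the rule below is an assumption: words at or below a threshold are played at a reduced rate, all others at normal speed. The function names, the normalized 0.0 to 1.0 confidence range and the default parameters are hypothetical.

```python
def playback_rate(confidence: float, threshold: float = 0.8,
                  slow_rate: float = 0.5) -> float:
    """Return a speed factor: 1.0 is normal speed, 0.5 is half speed.

    Assumed step rule: slow down when the confidence is at or below
    the threshold, otherwise play at normal speed.
    """
    return slow_rate if confidence <= threshold else 1.0


def playback_duration_ms(segment_ms: float, confidence: float) -> float:
    """Time needed to play a word's audio segment at its confidence-dependent rate."""
    return segment_ms / playback_rate(confidence)
```

Under these assumptions a 500 ms word with a confidence of 0.8 takes 1000 ms to play, while the same word with a confidence of 0.95 is played in its original 500 ms. A graded mapping from confidence to rate would be an equally plausible variant.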

It may further be observed that the indicating of the confidence level
information CLI, i.e. the confidence value, in accordance with the invention may be
performed acoustically. In this case a sound signal may be generated and emitted via a
loudspeaker. A different pitch or a different loudness or volume of the generated sound
signal may be used to indicate a different confidence value.
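
One conceivable way to map a confidence value to the pitch of such a sound signal is a linear interpolation between two frequencies. The sketch below is purely illustrative: the frequency bounds, the direction of the mapping (lower confidence gives a higher pitch) and the function name `tone_frequency_hz` are assumptions, and the confidence is assumed to be normalized to 0.0 to 1.0.

```python
def tone_frequency_hz(confidence: float, low_hz: float = 440.0,
                      high_hz: float = 880.0) -> float:
    """Map a confidence value to a tone frequency.

    Assumed convention: full confidence yields low_hz, zero confidence
    yields high_hz, with linear interpolation in between.
    """
    # Clamp so that out-of-range scores cannot produce unexpected pitches.
    c = min(1.0, max(0.0, confidence))
    return high_hz - (high_hz - low_hz) * c
```

The same interpolation could be applied to loudness instead of pitch by substituting volume bounds for the frequency bounds.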

It may be observed further that the indicating of the confidence level
information CLI, i.e. the confidence value, in accordance with the invention may be
performed by means of vibrations. In this case vibration means are additionally provided,
which vibration means can be brought into contact with the user, i.e. the corrector,
so that the corrector may feel or sense vibrations in dependence on the confidence
value of a word played back in the synchronous playback mode.

As already mentioned, the correction system 1 is implemented on a
conventional computer, such as a PC or workstation. It should be mentioned that portable
equipment, such as personal digital assistants (PDAs), laptops or mobile phones, may be
equipped with a correction system and/or speech recognition. The functionality described
by the invention is typically executed using the processor of the device. The processor,
such as a PC-type processor, micro-controller or DSP-like processor, can be loaded with a
program to perform the steps according to the invention. Such a computer program product
is usually loaded from a background storage, such as a hard disk or ROM. The computer
program product can initially be stored in the background storage after having been
distributed on a storage medium, like a CD-ROM, or via a network, like the public internet.



